Months ago I wrote a blog article on “How to write simple but sound Yara rules”. Since then, the techniques and tools mentioned there have improved. I'd like to give you a brief update on certain Yara features that I frequently use and on the tools that I use to generate and test my rules.


Handle Very Specific Strings Differently 


In the past I was glad to see very specific strings in samples and sometimes used these strings as the only indicator for detection. E.g. whenever I found a certain typo in the PE header fields like “Micorsoft Corportation”, I cheered and thought that this would make a great signature. But, and I have to admit that now, this only makes a nice signature. Great signatures should not only match a certain sample in the most condensed way but also match similar samples created by the same author or group.

Look at the following rule: 
rule Enfal_Malware_Backdoor {
    meta:
        description = "Generic Rule to detect the Enfal Malware"
        author = "Florian Roth"
        date = "2015/02/10"
        super_rule = 1
        hash0 = "6d484daba3927fc0744b1bbd7981a56ebef95790"
        hash1 = "d4071272cc1bf944e3867db299b3f5dce126f82b"
        hash2 = "6c7c8b804cc76e2c208c6e3b6453cb134d01fa41"
        score = 60
    strings:
        $x1 = "Micorsoft Corportation" fullword wide
        $x2 = "IM Monnitor Service" fullword wide
        $a1 = "imemonsvc.dll" fullword wide
        $a2 = "iphlpsvc.tmp" fullword
        $a3 = "{53A4988C-F91F-4054-9076-220AC5EC03F3}" fullword
        $s1 = "urlmon" fullword
        $s2 = "Registered trademarks and service marks are the property of their" wide
        $s3 = "XpsUnregisterServer" fullword
        $s4 = "XpsRegisterServer" fullword
    condition:
        uint16(0) == 0x5A4D and
        (
            ( 1 of ($x*) ) or
            ( 2 of ($a*) and all of ($s*) )
        )
}


When I review the 20 strings that yarGen generates, I try to categorize them into 3 different groups:

- Very specific strings (one of them is sufficient for a successful detection, e.g. IP addresses, payload URLs, PDB paths, user profile directories)
- Specific strings (strings that look good but may appear in goodware as well, e.g. “wwwilib.dll”)
- Other strings (even strings that appear in goodware, but no random code from compressed or encrypted data; e.g. “ModuleStart”)


Then I create a condition that requires:

- a certain magic header (remove it in case of ASCII text like scripts or webshells) AND
- 1 of the very specific strings OR
- some of the specific strings combined with many (but not all) of the common strings
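To make this three-part condition concrete, the following minimal Python sketch evaluates the condition of the Enfal rule above by hand on a byte buffer. It only illustrates the boolean logic and is not how YARA matches strings internally; it ignores the fullword modifier and assumes the sample is already loaded into memory.

```python
import struct


def pattern(text, wide=False):
    """Byte pattern for a YARA text string: plain ASCII, or with the 'wide'
    modifier each character followed by a zero byte (UTF-16LE for ASCII text)."""
    return text.encode("utf-16-le") if wide else text.encode("ascii")


# String groups of the Enfal rule; fullword is NOT emulated in this sketch.
X = [pattern("Micorsoft Corportation", wide=True),
     pattern("IM Monnitor Service", wide=True)]
A = [pattern("imemonsvc.dll", wide=True),
     pattern("iphlpsvc.tmp"),
     pattern("{53A4988C-F91F-4054-9076-220AC5EC03F3}")]
S = [pattern("urlmon"),
     pattern("Registered trademarks and service marks are the property of their", wide=True),
     pattern("XpsUnregisterServer"),
     pattern("XpsRegisterServer")]


def hits(data, group):
    """How many strings of a group occur anywhere in data."""
    return sum(1 for p in group if p in data)


def enfal_condition(data):
    # uint16(0) == 0x5A4D: little-endian read of the "MZ" header
    if len(data) < 2 or struct.unpack_from("<H", data, 0)[0] != 0x5A4D:
        return False
    very_specific = hits(data, X) >= 1                              # 1 of ($x*)
    combination = hits(data, A) >= 2 and hits(data, S) == len(S)    # 2 of ($a*) and all of ($s*)
    return very_specific or combination


print(enfal_condition(b"MZ" + pattern("IM Monnitor Service", wide=True)))      # True
print(enfal_condition(b"\x7fELF" + pattern("IM Monnitor Service", wide=True)))  # False
```

Note how a single very specific string is enough on its own, while the weaker $a and $s strings only count in combination.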
Here is another example that only has very specific strings ($x) and common strings ($s):


rule Cobra_Trojan_Stage1 {
    meta:
        description = "Cobra Trojan - Stage 1"
        author = "Florian Roth"
        reference = "https://blog.gdatasoftware.com/blog/article/analysis-of-project-cobra.html"
        date = "2015/02/18"
        hash = "a28164de29e51f154be12d163ce5818fceb69233"
    strings:
        $x1 = "KmSvc.DLL" fullword wide
        $x2 = "SVCHostServiceDll_W2K3.dll" fullword ascii
        $s1 = "Microsoft Corporation. All rights reserved." fullword wide
        $s2 = "srservice" fullword wide
        $s3 = "Key Management Service" fullword wide
        $s4 = "msimghlp.dll" fullword wide
        $s5 = "_ServiceCtrlHandler@16" fullword ascii
        $s6 = "ModuleStart" fullword ascii
        $s7 = "ModuleStop" fullword ascii
        $s8 = "5.2.3790.3959 (srv03.sp2.070216-1710)" fullword wide
    condition:
        uint16(0) == 0x5A4D and filesize < 50000 and 1 of ($x*) and 6 of ($s*)
}
If you can’t create a rule that is sufficiently specific, I recommend the following methods to restrict the rule:


- Magic Header (use it as the first element in the condition, see the performance guidelines; e.g. “uint16(0) == 0x5A4D”)
- File Size (malware that mimics valid system files, drivers or legitimate software often differs significantly in size from the originals; try to find the valid files online and set a size range in your rule, e.g. “filesize > 200KB and filesize < 600KB”)
- String Location (see the “Location is Everything” section)
- Exclude strings that occur in false positives (e.g. $fp1 = “McAfeeSig”)
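These restriction methods can be sketched in Python as a wrapper around whatever string check a rule performs. The helper name, the `looks_malicious` placeholder check and the default size window are illustrative assumptions taken from the examples in the list; in a real rule they are simply additional `and` clauses in the condition.

```python
KB = 1024  # YARA's KB suffix means 1024 bytes


def restricted_match(data, string_check, fp_markers=(b"McAfeeSig",),
                     min_size=200 * KB, max_size=600 * KB):
    """Hypothetical helper: apply magic header, file size window and
    false-positive exclusions around the rule's actual string logic."""
    if data[:2] != b"MZ":                        # uint16(0) == 0x5A4D
        return False
    if not min_size < len(data) < max_size:      # filesize > 200KB and filesize < 600KB
        return False
    if any(fp in data for fp in fp_markers):     # and not $fp1
        return False
    return string_check(data)                    # the rule's own string conditions


def looks_malicious(data):
    # Placeholder (hypothetical) for the rule's real string conditions
    return b"evil" in data


padding = b"\x00" * (300 * KB)
print(restricted_match(b"MZ" + padding + b"evil", looks_malicious))             # True
print(restricted_match(b"MZ" + padding + b"evil McAfeeSig", looks_malicious))   # False
print(restricted_match(b"MZevil", looks_malicious))                             # False (too small)
```

The cheap checks come first here, mirroring the advice to put the magic header at the start of the condition.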


Location is Everything 


One of the most underestimated features of Yara is the possibility to define a range in which strings have to occur in order to match. I used this technique to create a rule that detects Metasploit Meterpreter payloads quite reliably, even if they are encoded or cloaked. How does that work?

If you see malware code that is hidden in an overlay at the end of a valid executable (e.g. “ab.exe”) and the only strings you see are typical function exports or strings that mimic a well-known executable, ask yourself the following questions:
- Is it normal that these strings are located at this location in the file?
- Is it normal that these strings occur more than once in that file?
- Is the distance between two strings somehow specific?
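These three questions map onto YARA's match offsets (`@s1[1]`), match counts (`#s1`) and offset arithmetic such as `@s2[1] - @s1[1]`. The Python sketch below computes the same facts for two strings in a buffer, so you can answer the questions before writing the condition. The buffer is synthetic and the strings `FreeSid` and `WSASend` are just examples.

```python
def offsets(data, needle):
    """All start offsets of needle in data (overlapping ones included),
    comparable to YARA's @s[1], @s[2], ..."""
    found, pos = [], data.find(needle)
    while pos != -1:
        found.append(pos)
        pos = data.find(needle, pos + 1)
    return found


def location_profile(data, first, second):
    """Where is `first`, how often does it occur, how far away is `second`?"""
    a, b = offsets(data, first), offsets(data, second)
    return {
        "first_offset": a[0] if a else None,                   # @first[1]
        "relative_position": a[0] / len(data) if a else None,  # 0.0 = start, ~1.0 = end of file
        "count": len(a),                                       # #first
        "distance": b[0] - a[0] if a and b else None,          # @second[1] - @first[1]
    }


data = b"\x00" * 100 + b"FreeSid" + b"\x00" * 10 + b"WSASend" + b"\x00" * 10 + b"FreeSid"
print(location_profile(data, b"FreeSid", b"WSASend"))
```

Here `FreeSid` occurs twice and `WSASend` starts 17 bytes after its first occurrence; in a rule, such observations could become conditions like `#s1 == 2` and `@s2[1] - @s1[1] == 17`.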


[Screenshot: strings listing (offset, A/W, string) of the sample, showing imports such as FreeSid, AllocateAndInitializeSid, WSOCK32.dll, WS2_32.dll, WSARecv and WSASend around offset 958,600, directly followed by the version information of ApacheBench (ab.exe 2.2.14, Apache HTTP Server, Apache Software Foundation)]

Malware Strings 


In the case of unspecific malware code in a PE overlay, try to define a rule that looks for a certain file size (e.g. filesize > 800KB) and for the malware strings relative to the end of the file (e.g. $s1 in (filesize-500..filesize)).
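A minimal Python sketch of such an overlay condition follows. `in_range` mirrors the semantics of YARA's `$s1 in (a..b)`, where the string must start at an offset between both bounds (inclusive). The 800KB threshold, the 500-byte tail and the `WSASend` string are only example values.

```python
KB = 1024


def in_range(data, needle, lo, hi):
    """True if needle starts at an offset in [lo, hi], like `$s in (lo..hi)`."""
    pos = data.find(needle, max(lo, 0))  # first occurrence at or after lo
    return pos != -1 and pos <= hi


def overlay_condition(data, needle, min_size=800 * KB, tail=500):
    size = len(data)  # YARA's filesize
    # filesize > 800KB and $s1 in (filesize-500..filesize)
    return size > min_size and in_range(data, needle, size - tail, size)


carrier = b"MZ" + b"\x00" * (900 * KB)  # stand-in for a large legitimate executable
print(overlay_condition(carrier + b"WSASend", b"WSASend"))  # True: string sits in the last 500 bytes
print(overlay_condition(b"WSASend" + carrier, b"WSASend"))  # False: same string, wrong location
```

The same string matches or not depending only on where it sits, which is exactly what makes the location check selective.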

The following example shows an unspecified webshell that contains strings that an attacker may modify in future versions when using it in a victim’s network. Always try to extract strings that are less likely to change.


[Screenshot: webshell code opened in an editor (index.php)]


Webshell Code PHP 


The variable name “$code” is more likely to change than the function combination “@eval(gzinflate(base64_decode(” at the end of the file. Valid PHP code may contain “eval(gzinflate(base64_decode(” somewhere, but it is much less likely to occur in the last 50 bytes of the file.
I therefore wrote the following rule:
rule Webshell_b374k_related_1 {
    meta:
        description = "Detects b374k related webshell"
        author = "Florian Roth"
        reference = "https://goo.gl/ZuzV2S"
        score = 65
        hash = "d5696b32d32177cf70eaaa5a28d105823526d87e20d3c62b747517c6d41656f7"
        date = "2015-10-17"
    strings:
        $m1 = "<?php"
        $s1 = "@eval(gzinflate(base64_decode(" ascii
    condition:
        $m1 at 0 and $s1 in (filesize-50..filesize) and filesize < 20KB
}
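To check the rule's logic against a few test inputs, here is a Python sketch that re-implements its condition. It is an approximation for illustration only (a real scan is done by YARA itself), and the two sample files are made up.

```python
KB = 1024
M1 = b"<?php"
S1 = b"@eval(gzinflate(base64_decode("


def b374k_related_1(data):
    size = len(data)                                     # filesize
    return (data.startswith(M1)                          # $m1 at 0
            and data.find(S1, max(size - 50, 0)) != -1   # $s1 in (filesize-50..filesize)
            and size < 20 * KB)                          # filesize < 20KB


shell = b"<?php $code = '" + b"A" * 1000 + b"';@eval(gzinflate(base64_decode($code)));?>"
benign = b"<?php @eval(gzinflate(base64_decode($x))); " + b"/* more code */\n" * 100
print(b374k_related_1(shell))   # True: the eval chain sits in the last 50 bytes
print(b374k_related_1(benign))  # False: the same chain only appears near the top
```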


Performance Guidelines 


I collected many ideas from Wesley Shields and Victor M. Alvarez and composed a gist called “Yara Performance Guidelines”. This guide shows you how to write Yara rules that use fewer CPU cycles by avoiding CPU-intensive checks and by using the new condition-checking shortcuts introduced in Yara version 3.4.

Yara Performance Guidelines 


PE Module 
People sometimes ask why I don’t use the PE module. The reason is simple: I avoid using modules that are rather new and would like to see them thoroughly tested before using them in my scanners running in production environments. It is a great module and a lot of effort went into it. I would always recommend using the PE module in lab environments or sandboxes. In scanners that walk huge directory trees, however, a minor memory leak in one of the modules could lead to severe memory shortages. I'll give it another year to prove its stability and then start using it in my rules.


yarGen 


Since its last minor version, yarGen has an opcode feature. It is active by default but only useful in cases in which not enough strings can be extracted.
I currently use the following parameters to create my rules:


python yarGen.py --noop -z 0 -a "Florian Roth" -r "http://link-to-sample" /mal/malware 
The problem with the opcode feature is that it requires about 2.5 GB more main memory during rule creation, which is why I disable it with --noop. I'll change it to an optional parameter in the next version.


yarAnalyzer 


yarAnalyzer is a rather new tool that focuses on rule coverage. After creating a bigger rule set or a generic rule that should match several samples, you'll want to check the coverage of your rules, e.g. to detect overlapping rules (which is often OK).

yarAnalyzer helps you to get an overview of:

- rules that match on more than one sample
- samples that show hits from more than one rule
- rules without hits
- samples without hits
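The four views listed above can be computed from a plain list of (rule, sample) hits. The sketch below is not yarAnalyzer's code or output format; it assumes the hits were already collected (for example by scanning a sample directory with yara-python), and the rule and file names in the usage example are hypothetical.

```python
from collections import defaultdict


def coverage(hits, rules, samples):
    """hits: iterable of (rule_name, sample_name) pairs from a scan."""
    samples_per_rule = defaultdict(set)
    rules_per_sample = defaultdict(set)
    for rule, sample in hits:
        samples_per_rule[rule].add(sample)
        rules_per_sample[sample].add(rule)
    return {
        "rules_multi_sample": sorted(r for r in rules if len(samples_per_rule[r]) > 1),
        "samples_multi_rule": sorted(s for s in samples if len(rules_per_sample[s]) > 1),
        "rules_without_hits": sorted(r for r in rules if not samples_per_rule[r]),
        "samples_without_hits": sorted(s for s in samples if not rules_per_sample[s]),
    }


report = coverage(
    hits=[("Enfal_Malware_Backdoor", "a.exe"), ("Enfal_Malware_Backdoor", "b.exe"),
          ("Cobra_Trojan_Stage1", "b.exe")],
    rules=["Enfal_Malware_Backdoor", "Cobra_Trojan_Stage1", "Unused_Rule"],
    samples=["a.exe", "b.exe", "c.exe"],
)
for view, names in report.items():
    print(f"{view}: {names}")
```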
yarAnalyzer Screenshot


yarAnalyzer Github Repository 


String Extraction and Colorization 


To review the strings in a sample, I use a simple shell one-liner that a good friend once sent me.
“strings” version for Linux 


#!/bin/bash
# List ASCII (A) and UTF-16LE wide (W) strings with decimal offsets, merged and sorted by offset
(strings -a -td "$@" | sed 's/^\(\s*[0-9][0-9]*\) \(.*\)$/\1 A \2/' ; strings -a -td -el "$@" | sed 's/^\(\s*[0-9][0-9]*\) \(.*\)$/\1 W \2/') | sort -n

“gstrings” version for OS X (sudo port install binutils) 


#!/bin/bash
# Same as above with the GNU tools installed under their g-prefixed names
(gstrings -a -td "$@" | gsed 's/^\(\s*[0-9][0-9]*\) \(.*\)$/\1 A \2/' ; gstrings -a -td -el "$@" | gsed 's/^\(\s*[0-9][0-9]*\) \(.*\)$/\1 W \2/') | sort -n

It produces output like the screenshot above captioned “Malware Strings”: the offset, ASCII (A) or wide (W), and the string at that offset.
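If GNU binutils is not at hand, a similar merged listing can be approximated in pure Python. This is a rough stand-in for the shell one-liners, not a replacement for `strings`: it treats only printable ASCII (0x20 to 0x7E) as string characters, uses the same default minimum length of 4, and reads wide strings as UTF-16LE. The function names are hypothetical.

```python
import re


def extract_strings(data, min_len=4):
    """(offset, 'A' or 'W', text) tuples sorted by offset, similar to merging
    `strings -a -td` and `strings -a -td -el` output with `sort -n`."""
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    wide_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [(m.start(), "A", m.group().decode("ascii")) for m in ascii_re.finditer(data)]
    found += [(m.start(), "W", m.group().decode("utf-16-le")) for m in wide_re.finditer(data)]
    return sorted(found)


def print_strings(path):
    """Print the listing for one file in the 'offset type string' layout."""
    with open(path, "rb") as fh:
        for offset, kind, text in extract_strings(fh.read()):
            print(f"{offset:>10} {kind} {text}")
```

For example, `print_strings("ab.exe")` prints one line per string, ready to be piped into a colorizer.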

To colorize the strings, check out my new tool “prisma”, which colorizes standard output of any kind.
[Screenshot: strings output of ab.exe (ApacheBench usage text, POST data messages, Winsock error strings) colorized by prisma]

Prisma STDOUT colorization 


Contact 


Follow me on Twitter: @Cyb3rOps 


